The long reach of DNA sequence heterogeneity in diffusive processes

Many biological processes involve one-dimensional diffusion over a correlated inhomogeneous energy landscape with a correlation length $\xi_c$. Typical examples are specific protein target location on DNA, nucleosome repositioning, or DNA translocation through a nanopore, in all cases with $\xi_c \sim 10$ nm. We investigate such transport processes by the mean first passage time (MFPT) formalism, and find diffusion times which exhibit strong sample-to-sample fluctuations. For a displacement $N$, the average MFPT is diffusive, while its standard deviation over the ensemble of energy profiles scales as $N^{3/2}$ with a large prefactor. Fluctuations are thus dominant for displacements smaller than a characteristic $N_c \gg \xi_c$: typical values are much less than the mean, and governed by an anomalous diffusion rule. Potential biological consequences of such random walks, composed of rapid scans in the vicinity of favorable energy valleys and occasional jumps to further valleys, are discussed.

PACS numbers: 87.10.+e, 87.14.Gg, 87.15.Vv, 05.40.Fb 



I. INTRODUCTION 

Diffusion appears in most basic processes in living matter and has therefore been studied extensively by theoretical and experimental biophysicists for many decades. At the macroscopic scale, the phenomena are adequately described by continuum models that form a well-established methodology with many applications in science and technology [?]. Advanced experimental methods, such as nanoprobing and single-molecule techniques, provide us with a wealth of data at the microscopic level. Theoretical description of the observed phenomena at such scales is often a considerable challenge, since many irregular features that average out on the macroscopic scale can no longer be ignored. Sometimes, however, rather simple characteristics emerge, allowing for exact analytic treatment.

One-dimensional (1D) transport is rarely found on the macroscopic scale; at the molecular level, though, one can find several examples, e.g. kinesin motion along microtubules [3] or DNA translocation through a nanopore [?]. Usually, in such problems, the underlying potential profile is considered to be constant or at least regular. However, as we show in this paper, DNA sequence heterogeneity and the resulting random energy landscape can have a considerable influence on the diffusion up to biologically relevant length scales at room temperature.



A. Protein-DNA interaction 

The first example we study here arises in the context of protein-DNA interaction. As proposed by von Hippel and Berg [?], and recently observed in many systems [11], 1D "sliding" of proteins along the DNA molecule is an important component of protein specific site location, at least in prokaryotes. The "sliding" is viewed as an unbiased, thermally activated process. The actual rules of motion for sliding depend on the details of the interaction between the protein and the DNA. The general belief is that there are two protein-DNA binding modes: a strong "specific" mode that characterizes binding of operator sites, and a much weaker "non-specific" mode in which binding of non-cognate DNA occurs [?]. In the "non-specific" or "search" mode, the interaction energy is usually assumed to be independent of the DNA sequence that the protein is bound to, though little experimental evidence besides the relatively fast observed search times favors this strictly "equipotential" picture. On the other hand, scanning force microscopy experiments by Erie et al. [?] clearly demonstrate DNA bending by the Cro repressor protein, both at operator and at non-operator sequences [39]. Since local DNA elasticity is known to be highly sequence-dependent [16], the energy of a protein bound at random locations should have a random component, correlated at length scales of the order of the protein binding domain size; see Fig. 1(a). This sequence-dependent interaction energy component appears in addition to possible local uncorrelated sequence-dependent contributions from amino acid-base pair contacts.

To estimate the significance of the random component of the elastic energy, we use DNA elasticity data supplied by the BEND.IT server [13], which incorporates DNase I based bendability parameters [?] and the consensus bendability scale [19]. We assume that the protein-DNA complex in Fig. 1(a) has a fixed geometry, i.e. the protein is "hard." Then, the random component of the binding energy $\delta U$ is proportional to the random component of the Young's modulus $\delta E$,

FIG. 1: (a) Prokaryotic transcription factor sliding; (b) Nucleosome repositioning.

where $l_p \simeq 50$ nm is the DNA persistence length, $\theta \simeq 60^\circ$ is the curvature angle [?], $L = 10$-$20$ bp is the bent sequence length and $E \simeq 3.4 \times 10^{8}$ N/m$^2$ is the average Young's modulus. The resulting potential profile is plotted in Fig. 2. The standard deviation of the random component is $\langle(\delta U)^2\rangle^{1/2} \simeq 0.5$-$1.5\,k_BT$, so that disorder appears to be relevant for this problem.

Another interesting example, also from the field of protein-DNA interaction, was considered recently by Schiessel et al. [20], and deals with nucleosome repositioning by DNA reptation. It was argued that chromatin remodeling [?, 22] can be readily understood in terms of intranucleosomal loop diffusion, the size of the loop resulting mainly from a compromise between elastic energy and nucleosome-DNA binding energy. Here again, for a given size of the loop, the elastic energy is sequence-dependent [22], and therefore has a random component with finite correlation length; see Fig. 1(b). For nucleosome repositioning, this effect may be even more pronounced than for prokaryotic protein-DNA interaction: the bending angles $\theta$ and the sequence lengths $L$ are 2-3 times larger, so that the net effect may be twice as strong as for the Cro repressor [20].

It is known that DNA can have an intrinsic curvature arising from the stacking interactions between base pairs. Such sequence-dependent curvature can play a role similar to sequence-dependent DNA bendability in providing a correlated landscape. Bending an intrinsically curved region costs less energy, since the DNA-protein complex requires a smaller angular deformation $\theta = \theta_{\rm complex} - \theta_{\rm intrinsic}$. Such sequence-dependent intrinsic curvature was suggested to be involved in positioning nucleosomes [23].

Aside from DNA bendability and curvature, local correlations in nucleotide composition, known to be present in eukaryotic genomes (AT/GC-rich isochores), can result in a correlated landscape of the protein-DNA binding energy. This effect becomes especially pronounced when a DNA-binding protein has a strong preference toward a particular AT/GC composition of its site. However, in this case, variations take place over much longer scales, and are not quantitatively relevant in the specific contexts addressed in this paper.

FIG. 2: (a) Energy of local elastic deformation and (b) potential profile correlator, as calculated from the data supplied by the BEND.IT server for a segment of the E. coli genome. The deformed DNA sequence is assumed to be of length $L = 15$ bp.

Both examples above can be viewed as specific cases of DNA reptation by means of a propagating defect (or "slack") of a fixed size. The elastic energy associated with slack creation is sequence-dependent and correlated on the scale of the slack size. The propagating defect is well localized and samples the energies of well-defined subsequent DNA segments. As was pointed out by Cule and Hwa [24], short-range correlated randomness of this kind has no effect on the scaling of the reptation time. However, as we show below, the defect motion itself is strongly influenced by the disorder and has nontrivial behavior at different length scales.

B. DNA translocation through a nanopore 

Consider a piece of single-stranded DNA (ssDNA) passing through a large membrane channel. If the potential difference across the membrane is zero, the motion of the ssDNA is governed by thermal fluctuations. Since the channel width differs from the ssDNA external diameter by only a few angstroms [40], it is reasonable to expect local interactions between the nucleotides and the amino acids of the channel. These interactions may have a local base-dependent component. In addition, longer-range terms are likely to appear in the presence of a voltage difference. In the cytoplasm, the DNA negative charge is almost completely screened out at distances of a few nanometers by the counterion cloud. When the DNA molecule enters the pore, most of the counterions are likely to be "shaven off," though some of them may remain stuck to the DNA; see Fig. 3. Thus, the linear charge density inside the pore acquires a random and basically uncorrelated component:

$$q(x) = \bar{q}(x) + \delta q(x), \qquad \langle \delta q(x)\,\delta q(y)\rangle = \rho^2 d\,\delta(x-y). \tag{2}$$

The potential energy of the DNA segment inside the pore in the presence of a voltage difference $V_0$ is

$$U(x) = \frac{V_0}{h}\int_x^{x+h} x'\,q(x')\,dx'. \tag{3}$$

FIG. 3: ssDNA transport through the nanopore; on the right: charge density $q(x)$ and correlator $g(r) = \langle[\delta U(x) - \delta U(x+r)]^2\rangle / (2\langle \delta U^2(x)\rangle)$ as a function of the coordinate $r$.

Since the average charge density $\bar{q}(x)$ is nonzero, DNA transport is driven by the average force $V_0\bar{q}(x)/h$. The correlation function of the random component of $U(x)$ is readily calculated to be



(SU(x)SU(x+y)) = Y^(h-\y\f 



H(h-\y\), 



(4) 

where $H(x)$ is the Heaviside function. Thus, the potential profile for DNA motion has a random component with correlation length $h$. Taking $V_0 \sim 100$ mV, $\rho \sim e/h$ ($e$ is the elementary charge), $h \sim 10$ nm, we obtain $\delta U \sim k_BT$.

Although this example differs from the ones above in that a nonzero average driving force is present, large random fluctuations of the energy landscape may have a significant effect on the distribution of translocation times, a problem that has attracted much interest lately [?].



II. DIFFUSION IN A RANDOM POTENTIAL 
A. The model 

The problems described above map onto a one-dimensional random walk with position-dependent hopping probabilities $p_i$ and $q_i = 1 - p_i$ to the right and to the left, respectively; it is most natural to assume the regular activated transport form

$$p_i \propto e^{-\beta(U_{i+1}-U_i)}, \qquad q_i \propto e^{-\beta(U_{i-1}-U_i)}, \tag{5}$$

where $\beta = (k_BT)^{-1}$ and $U_i$ is the sequence-dependent component of the potential energy. The latter is basically a sum of many random contributions and can therefore be considered to be normally distributed [13]. Thus, in the absence of correlations, the probability for realization of a certain profile $U(x)$ of length $L$ is (in the continuum limit)

$$P[U(x)] \propto \exp\left[-\alpha\int dx\, U^2(x)\right]. \tag{6}$$

This is the well-known Random-Energy Model [?] that was applied successfully to various biophysical problems, from protein folding [?] to protein-DNA interaction [?]. It assumes no correlations between energies of different sites. One can think of a more general form of potential profile,

$$P[U(x)] \propto \exp\left[-\iint dy\,dx\; U(x)\,G(x-y)\,U(y)\right]. \tag{7}$$

Taking, for example, $G(x-y) \propto \partial_x\partial_y\,\delta(x-y)$, we obtain the Random-Force Model [?] that describes an energy landscape appearing as a random walk with linearly growing correlations. This model has been studied over the last decades in the context of heteropolymer dynamics [24, 29], glassy systems [?, 31] and, quite recently, to describe DNA denaturation dynamics [32]. Characteristic features of the Random-Force Model are logarithmically slow ("Sinai's") diffusion [?] and aging [?]. More generally, $G$ is related to the correlator of $U$ by $\langle U(x)U(y)\rangle = G^{-1}(x-y)$.

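To include finite-range correlations in Eq. (6), we must incorporate a limitation on the acceptable forces. The ensemble of energy profiles is therefore naturally described by the following probability density

$$P[U(x)] \propto e^{-\mathcal{H}[U]}, \tag{8a}$$

with pseudoenergy

$$\mathcal{H}[U] = \int dx \left[\alpha U^2(x) + \gamma\left(\frac{dU}{dx}\right)^2\right]. \tag{8b}$$

Energy level statistics for this kind of potential profile is also Gaussian, as can be seen from the average

$$\left\langle e^{ikU}\right\rangle = \frac{\int\mathcal{D}[U]\,e^{ikU-\mathcal{H}[U]}}{\int\mathcal{D}[U]\,e^{-\mathcal{H}[U]}} = \exp\left(-\frac{k^2}{8\sqrt{\alpha\gamma}}\right), \tag{9}$$

which is the characteristic function of a Gaussian distribution with zero mean and variance

$$\sigma^2 = \frac{1}{4\sqrt{\alpha\gamma}}. \tag{10}$$

The correlator of the potential profile is readily calculated as

$$g(r) \equiv \tfrac{1}{2}\left\langle [U(x)-U(x+r)]^2\right\rangle = \sigma^2\left(1 - e^{-|r|/\xi_c}\right), \tag{11}$$

where $\xi_c = \sqrt{\gamma/\alpha}$ is the correlation length.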
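A profile with these statistics can be sampled numerically. The following is a minimal sketch, not the procedure of the Appendix (which is not reproduced in this section): it assumes a first-order autoregressive (discrete Ornstein-Uhlenbeck) recursion, which yields a stationary Gaussian sequence with variance $\sigma^2$ and covariance $\sigma^2 e^{-r/\xi_c}$, so that half the mean-square difference equals $\sigma^2(1 - e^{-r/\xi_c})$. The function names and the choice of lattice units ($d = 1$) are illustrative assumptions.

```python
import math
import random


def correlated_profile(n_sites, sigma, xi_c, rng):
    """Gaussian profile with variance sigma**2 and covariance sigma**2 * exp(-r / xi_c).

    Distances are in units of the lattice spacing (an assumption of this sketch).
    xi_c = 0 gives an uncorrelated profile.
    """
    a = math.exp(-1.0 / xi_c) if xi_c > 0 else 0.0
    noise = sigma * math.sqrt(1.0 - a * a)
    u = [rng.gauss(0.0, sigma)]  # start in the stationary distribution
    for _ in range(n_sites - 1):
        u.append(a * u[-1] + rng.gauss(0.0, noise))
    return u


def half_mean_square_difference(u, r):
    """Sample estimate of <[U(x) - U(x + r)]^2> / 2 at lag r."""
    diffs = [(u[i] - u[i + r]) ** 2 for i in range(len(u) - r)]
    return 0.5 * sum(diffs) / len(diffs)


rng = random.Random(0)
profile = correlated_profile(200_000, sigma=1.0, xi_c=10.0, rng=rng)
for r in (1, 10, 100):
    expected = 1.0 - math.exp(-r / 10.0)
    print(r, round(half_mean_square_difference(profile, r), 3), round(expected, 3))
```

The printed pairs compare the sampled correlator with $\sigma^2(1 - e^{-r/\xi_c})$; agreement within statistical error is expected for a long enough profile.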
B. Mean First Passage Time

A convenient formalism for analyzing diffusion in a random one-dimensional potential profile is that of the mean first-passage time [?]. For a given set of probabilities $\{p_i\}$, the mean first-passage time (MFPT) from $i = 0$ to $i = N$ (in terms of the number of steps) is

$$t_{0,N} = N + \sum_{k=0}^{N-1}\omega_k + \sum_{k=0}^{N-2}\sum_{i=k+1}^{N-1}(1+\omega_k)\prod_{j=k+1}^{i}\omega_j, \tag{12}$$

where $\omega_i = q_i/p_i$ (see the Appendix for the derivation). The MFPT given by this expression is for a fixed realization of probabilities, i.e. for a given potential energy profile; as such, it is itself a random variable. The disorder-averaged version of the MFPT is readily obtained after we note that the sequential products in Eq. (12) reduce to

$$\prod_{j=k}^{i}\omega_j = \exp\left[\beta\left(U_{i+1}+U_i-U_k-U_{k-1}\right)\right]. \tag{13}$$
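For numerical work, the MFPT sum can be evaluated in $O(N)$ operations. The sketch below is an illustration under stated assumptions: it uses the recursion $T_i = 1 + \omega_i + \omega_i T_{i-1}$ with $T_{-1} = 0$ for the mean time of the first step from site $i$ to $i+1$, whose sum over $i = 0, \dots, N-1$ reproduces the MFPT expression above, and it takes $\omega_i = \exp[\beta(U_{i+1} - U_{i-1})]$ from the activated hopping form. The function names and the padding convention for the energy list are hypothetical.

```python
import math
import random


def hopping_ratios(u, beta):
    """omega_i = q_i / p_i = exp(beta * (U_{i+1} - U_{i-1})) for i = 0 .. N-1.

    u must hold U_{-1}, U_0, ..., U_N, i.e. len(u) == N + 2 (a padding
    convention assumed here so that every omega_i is defined).
    """
    n = len(u) - 2
    return [math.exp(beta * (u[i + 2] - u[i])) for i in range(n)]


def mfpt(omega):
    """Mean first-passage time from site 0 to site N = len(omega), in steps."""
    total = 0.0
    t_prev = 0.0  # T_{-1} = 0
    for w in omega:
        t_prev = (1.0 + w) + w * t_prev  # T_i = 1 + omega_i + omega_i * T_{i-1}
        total += t_prev
    return total


def mfpt_direct(omega):
    """Literal O(N^2) evaluation of the double sum, kept only as a cross-check."""
    n = len(omega)
    total = n + sum(omega)
    for k in range(n - 1):
        prod = 1.0
        for i in range(k + 1, n):
            prod *= omega[i]  # running product of omega_j for j = k+1 .. i
            total += (1.0 + omega[k]) * prod
    return total


# Flat landscape: every omega_i = 1, so the MFPT is N * (N + 1).
print(mfpt([1.0] * 10))

rng = random.Random(3)
energies = [rng.gauss(0.0, 1.0) for _ in range(22)]  # U_{-1} .. U_{20}
w = hopping_ratios(energies, beta=0.7)
print(mfpt(w), mfpt_direct(w))
```

The two evaluations agree to rounding error; the recursion is preferable for long profiles because the direct double sum grows quadratically with $N$.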



For an uncorrelated potential profile, this exponential factorizes into independent exponentials; after the ensemble averaging and the summations are carried out, we obtain for $N \gg 1$

$$\langle t_{0,N}\rangle = N^2 e^{2\beta^2\sigma^2}, \tag{14}$$

where, for the uncorrelated potential ($\gamma = 0$),

$$\sigma^2 = \frac{1}{2\alpha d}, \tag{15}$$

and $d$ is the lattice spacing. Note that this expression cannot be obtained by simply putting $\gamma = 0$ in Eq. (10). The reason is that when $\gamma$ becomes small, the discrete nature of the underlying lattice starts to matter. The integration in momentum space extends only up to $|q_{\max}| = \pi/d$, and thus,

$$\sigma^2 = \int_{-\pi/d}^{\pi/d}\frac{dq}{4\pi\alpha} = \frac{1}{2\alpha d}. \tag{16}$$

Returning to the case of a finite correlation length, we note that in the limit $\xi_c \gg d$, variations of the potential between neighboring sites can be neglected compared to variations between sites separated by distances of order $\xi_c$ or larger. Since the main contribution to the MFPT comes from the double sum in Eq. (12), we can write the continuum version as

$$t_{0,N} \simeq 2\int_0^N dx\int_x^N dy\; e^{2\beta[U(y)-U(x)]}. \tag{17}$$

To average over all possible realizations of $\{U(x)\}$, we calculate

$$\left\langle e^{2\beta[U(y)-U(x)]}\right\rangle = \frac{\int\mathcal{D}[U]\,e^{2\beta[U(y)-U(x)]}\,e^{-\mathcal{H}[U]}}{\int\mathcal{D}[U]\,e^{-\mathcal{H}[U]}} = \exp\left[4\beta^2\sigma^2\left(1-e^{-|x-y|/\xi_c}\right)\right]. \tag{18}$$
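The Gaussian average in Eq. (18) can be spelled out in a few lines. The sketch below assumes only that $U$ is a zero-mean Gaussian field with covariance $\langle U(x)U(y)\rangle = \sigma^2 e^{-|x-y|/\xi_c}$, as implied by Eqs. (10) and (11); the shorthand $X$ is introduced here for convenience.

```latex
\begin{align*}
X &\equiv 2\beta\,[U(y) - U(x)], \qquad
  \langle e^{X}\rangle = e^{\langle X^2\rangle/2}
  \quad\text{(zero-mean Gaussian)},\\
\langle X^2\rangle &= 4\beta^2\left[2\sigma^2 - 2\sigma^2 e^{-|x-y|/\xi_c}\right]
  = 8\beta^2\sigma^2\left(1 - e^{-|x-y|/\xi_c}\right),\\
\left\langle e^{2\beta[U(y)-U(x)]}\right\rangle
  &= \exp\left[4\beta^2\sigma^2\left(1 - e^{-|x-y|/\xi_c}\right)\right],\\
4\beta^2\sigma^2\,\frac{|x-y|}{\xi_c}
  &= \frac{4\beta^2}{4\sqrt{\alpha\gamma}}\,\sqrt{\frac{\alpha}{\gamma}}\;|x-y|
  = \frac{\beta^2\,|x-y|}{\gamma}
  \qquad (|x-y| \ll \xi_c).
\end{align*}
```

The last line uses $\sigma^2 = 1/(4\sqrt{\alpha\gamma})$ and $\xi_c = \sqrt{\gamma/\alpha}$ and gives the short-distance form of the exponent.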



For $|x-y| \ll \xi_c$, Eq. (18) reduces to $\exp(\beta^2|x-y|/\gamma)$, so that for $N \ll \xi_c$ we have

$$\langle t_{0,N}\rangle \simeq N^2\exp\left(4\beta^2\sigma^2 N/\xi_c\right). \tag{19}$$

(Here and in what follows, we measure distances in units of $d$, unless specified otherwise.) This kind of exponential creep is to be expected, since for $\alpha \to 0$, $\xi_c \to \infty$ our model (8) reduces to the Random-Force Model.

In the opposite limit $|x-y| \gg \xi_c$, we can neglect the exponential $e^{-|x-y|/\xi_c}$ in Eq. (18), so that Eq. (17) produces an ordinary diffusion law, with a disorder-renormalized diffusion coefficient:

$$\langle t_{0,N}\rangle \simeq N^2 e^{4\beta^2\sigma^2}. \tag{20}$$

Comparing Eqs. (20) and (14), we see that diffusion in a correlated potential profile proceeds more slowly than in an uncorrelated profile. It is straightforward to obtain an expression for the disorder-averaged MFPT for arbitrary correlation length. If we keep all four terms in the exponential in Eq. (13) while going to the continuum limit, we obtain

$$t_{0,N} \simeq 2\int_0^N dx\int_x^N dy\; e^{\beta[U(x+d)+U(x)-U(y)-U(y-d)]}. \tag{21}$$

Averaging this expression over the disorder as in Eq. (18) yields for $N \gg \xi_c$

$$\langle t_{0,N}\rangle \simeq N^2\exp\left[2\beta^2\sigma^2\left(1+e^{-d/\xi_c}\right)\right], \tag{22}$$

which has the obvious limits of Eqs. (14) and (20) for $\xi_c \to 0$ and $\xi_c \gg d$, respectively.



III. TYPICAL VS AVERAGE 

Large deviations from the average are characteristic of many disordered systems. In this section, we therefore explore the typical properties of random walks as compared to the disorder-averaged ones.

A. Quantifying fluctuations 

After the potential profile is generated (see the Appendix), we calculate the MFPT using Eq. (12). Fig. 4 presents the mean first passage times calculated for various realizations of $U(x)$ at biologically relevant temperature ($\sigma \sim k_BT$). It is clear that although the ensemble-averaged MFPT does behave as prescribed by Eq. (22), the typical MFPT exhibits high variability from one profile to another. The stepwise shape of typical curves suggests that a random walk in such a profile consists of regions characterized by subdiffusion (vertical "steps") and superdiffusion (plateaus), appearing intermittently. Uncorrelated potential profiles, as Fig. 4(b) shows, also lead to a certain disorder-induced variability, though of a considerably smaller magnitude.

To quantify the sample dependence of the MFPT, we calculate its variance over the ensemble of potential profiles. Fig. 5 presents the standard deviation of $t_{0,N}$ as a function of $N$ for correlated as well as uncorrelated potential profiles. We observe that the variance scales as $N^3$ for all profiles. This dependence can be obtained analytically in a straightforward fashion. Consider the average of the square of the MFPT in a potential profile with correlation length $\xi_c$, written as a product of two double sums with indices $(i,k)$ and $(m,l)$. The leading term is $N^4\exp(8\beta^2\sigma^2)$, and it comes from independent $\{i,k,l,m\}$. The next largest contribution comes from terms with $i = m$ or $k = l$. There are $\sim N^3$ such terms, each contributing $\exp(12\beta^2\sigma^2)$. Next, we note that in order to make a contribution of the same order of magnitude, the two indices ($i, m$ or $k, l$) need not coincide exactly; it is sufficient that they are less than one correlation length apart. Hence, after the leading $O(N^4)$ term is cancelled by $\langle t_{0,N}\rangle^2$, the variance is

$$\langle(\Delta t_{0,N})^2\rangle \simeq \xi_c N^3\exp(12\beta^2\sigma^2). \tag{23}$$

Similar reasoning yields for the uncorrelated case

$$\langle(\Delta t_{0,N})^2\rangle \simeq N^3\exp(6\beta^2\sigma^2). \tag{24}$$
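The exponents $12\beta^2\sigma^2$ and $6\beta^2\sigma^2$ follow from counting independent Gaussian terms. The worked steps below are a sketch under two assumptions: energies at sites further apart than $\xi_c$ are treated as independent with variance $\sigma^2$, and $\langle e^{X}\rangle = e^{\langle X^2\rangle/2}$ for a zero-mean Gaussian $X$ (a shorthand introduced here).

```latex
\begin{align*}
\text{correlated } (\xi_c \gg d),\ i = m: \quad
  X &= 2\beta\,(2U_i - U_k - U_l), \\
  \langle X^2\rangle &= 4\beta^2\sigma^2\,(4 + 1 + 1) = 24\,\beta^2\sigma^2,
  \qquad \langle e^{X}\rangle = e^{12\beta^2\sigma^2};\\[4pt]
\text{uncorrelated},\ i = m: \quad
  X &= \beta\,(2U_{i+1} + 2U_i - U_k - U_{k-1} - U_l - U_{l-1}), \\
  \langle X^2\rangle &= \beta^2\sigma^2\,(4 + 4 + 1 + 1 + 1 + 1) = 12\,\beta^2\sigma^2,
  \qquad \langle e^{X}\rangle = e^{6\beta^2\sigma^2}.
\end{align*}
```

Dividing the correlated variance by the square of $N^2 e^{4\beta^2\sigma^2}$ gives a relative variance of order $\xi_c e^{4\beta^2\sigma^2}/N$, and the uncorrelated case gives $e^{2\beta^2\sigma^2}/N$; these ratios are of order one at the crossover distance.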

We see that for given $\sigma$ and $\beta$, correlated energy landscapes produce stronger fluctuations in the MFPT than uncorrelated ones, in agreement with Fig. 5.
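The sample-to-sample spread can be probed with a small Monte Carlo experiment. This is a minimal sketch for the uncorrelated case only, under stated assumptions: energies are independent Gaussians measured in units of $k_BT$ (so their standard deviation is $\beta\sigma$), the MFPT is accumulated with the recursion $T_i = 1 + \omega_i + \omega_i T_{i-1}$, and the parameter values and function name are illustrative.

```python
import math
import random
import statistics


def mfpt_uncorrelated_sample(n, beta_sigma, rng):
    """MFPT over n sites for one uncorrelated Gaussian profile.

    Energies are drawn in units of k_B T, so their standard deviation is beta_sigma.
    """
    u = [rng.gauss(0.0, beta_sigma) for _ in range(n + 2)]  # beta*U_{-1} .. beta*U_n
    total = 0.0
    t_prev = 0.0
    for i in range(n):
        w = math.exp(u[i + 2] - u[i])   # omega_i = exp(beta * (U_{i+1} - U_{i-1}))
        t_prev = 1.0 + w + w * t_prev   # mean time of the first step from i to i+1
        total += t_prev
    return total


rng = random.Random(1)
n, beta_sigma = 200, 0.5
samples = [mfpt_uncorrelated_sample(n, beta_sigma, rng) for _ in range(400)]
mean_t = statistics.fmean(samples)
std_t = statistics.stdev(samples)
predicted_mean = n**2 * math.exp(2.0 * beta_sigma**2)  # N^2 exp(2 beta^2 sigma^2)

print("mean / predicted:", mean_t / predicted_mean)
print("relative spread :", std_t / mean_t)
```

For these parameters $N$ is well above $e^{2\beta^2\sigma^2}$, so the sample mean should sit close to $N^2 e^{2\beta^2\sigma^2}$ while the relative spread remains visible; lowering the temperature (raising `beta_sigma`) widens the distribution quickly.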

Comparing the expressions for the variance with the corresponding expressions for the disorder-averaged MFPT, we see that for any temperature, there is a characteristic distance $N_c$, below which there is no self-averaging and the typical MFPT is determined by fluctuations. This length is

$$N_c \sim \xi_c\, e^{4\beta^2\sigma^2} \tag{25}$$

for correlated profiles, and

$$N_c \sim e^{2\beta^2\sigma^2} \tag{26}$$

for uncorrelated ones. This effect is akin to "freezing" in the Random-Energy Model [?]: for low enough temperatures, typical passage times for distances below $N_c$ are dominated by high barriers. This is more pronounced for correlated profiles since, in addition to the stronger temperature dependence, there is amplification by a factor of $\sim\xi_c$, as sites within a correlation length give similar contributions. Figure 6 demonstrates the lack of self-averaging for uncorrelated potential profiles at short distances and low temperatures: the median MFPT (defined as the 50th percentile of a sample) shows large deviations from the average at distances shorter than $N_c$ and coincides with it at distances larger than $N_c$.

FIG. 4: Mean first passage times: typical versus average. Thick solid lines are the result of averaging over 1000 realizations of potential profiles ($\beta\sigma = 1.0$): (a) correlated profile with $\xi_c = 40.0$; (b) uncorrelated profile.

FIG. 5: MFPT standard deviation for $\beta\sigma = 1.0$ for correlated and uncorrelated potential profiles.

FIG. 6: Median versus disorder-averaged (solid lines calculated from Eq. (22)) MFPT. Median values were calculated for 1000 realizations of potential profiles: (a) correlated potential profile with $\xi_c = 20.0$; (b) uncorrelated potential profile.
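To get a feeling for the magnitudes involved, the crossover distance can be tabulated for a few temperatures. The helper below is a small illustration; it simply evaluates $N_c \sim \xi_c e^{4\beta^2\sigma^2}$ for correlated profiles and $N_c \sim e^{2\beta^2\sigma^2}$ for uncorrelated ones, in lattice units, and its name and signature are hypothetical.

```python
import math


def crossover_length(beta_sigma, xi_c=None):
    """Characteristic distance N_c in lattice units.

    xi_c=None selects the uncorrelated estimate exp(2 (beta sigma)^2);
    otherwise the correlated estimate xi_c * exp(4 (beta sigma)^2) is used.
    """
    if xi_c is None:
        return math.exp(2.0 * beta_sigma**2)
    return xi_c * math.exp(4.0 * beta_sigma**2)


for bs in (0.5, 1.0, 1.5, 2.0):
    uncorrelated = crossover_length(bs)
    correlated = crossover_length(bs, xi_c=20.0)
    print(f"beta*sigma={bs}: N_c(uncorrelated)={uncorrelated:.3g}, N_c(correlated)={correlated:.3g}")
```

For $\beta\sigma = 1$ and $\xi_c = 20$, the correlated estimate is $20\,e^{4} \approx 1.1 \times 10^{3}$ sites, compared with $e^{2} \approx 7.4$ sites for the uncorrelated profile, which illustrates how far the fluctuation-dominated regime extends beyond $\xi_c$.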



B. Anomalous diffusion 

The lack of self-averaging in the region $\xi_c \ll N \ll N_c$ can be quantified by estimating the typical MFPT. Consider Eq. (13) for an uncorrelated potential and define the following coarsening procedure: $\tilde{U}_i = U_{2i} + U_{2i+1}$. Then, in the "freezing regime," the double sum

$$\sum_k\sum_i \exp\left[\beta\left(\tilde{U}_i - \tilde{U}_k\right)\right] \tag{27}$$

is dominated by the pair $(i,k)$ producing the largest exponent. For a finite sample $\{U_i\}$ of size $N$ and variance $\sigma^2$, the corresponding sample $\{\tilde{U}_i\}$ contains $N/2$ values distributed with a variance $2\sigma^2$. The maximum and the minimum of $\{\tilde{U}_i\}$ therefore have characteristic values of $\pm 2\sigma\sqrt{\ln[N/(2\sqrt{2\pi})]}$, respectively. Thus, a typical MFPT for an uncorrelated potential reads

$$t_{0,N} \sim \exp\left[4\beta\sigma\sqrt{\ln\frac{N}{2\sqrt{2\pi}}}\right]. \tag{28}$$

For the purposes of estimating the extreme values of a correlated energy landscape, the sample size is effectively reduced by a factor of $\sim\xi_c$; therefore, the extrema of $\{U_i\}$ are approximately $\pm\sigma\sqrt{2\ln[N/(\xi_c\sqrt{2\pi})]}$. Noting that sites within a correlation length around the extrema contribute similarly to the MFPT, for a correlated potential we write

$$t_{0,N} \sim \xi_c\exp\left[4\beta\sigma\sqrt{2\ln\frac{N}{\xi_c\sqrt{2\pi}}}\right]. \tag{29}$$

Figure 7 compares typical values of $t_{0,N}$ calculated from Eqs. (28) and (29) with numerically calculated median values of the MFPT. We see that our analytical estimates produce the correct order of magnitude for $t_{0,N}$. As expected, for uncorrelated profiles, the agreement is better at lower temperatures; for higher temperatures, Eq. (28) is an underestimate, since we do not include contributions from the second-lowest, second-highest, etc., energy levels. Eq. (29), on the other hand, turns out to be a slight overestimate, since we have replaced the average of $\sim\xi_c$ terms by their maximum value.

FIG. 7: Typical MFPT for $N < N_c$ at various values of $\beta\sigma$: (a) uncorrelated potential profile; (b) correlated potential profile with $\xi_c = 10$. Solid lines are the analytical estimates from Eqs. (28) and (29).

FIG. 8: Probability density functions for the MFPT calculated for 100,000 uncorrelated profile realizations at $\beta\sigma = 2$.
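The gap between typical and average passage times below $N_c$ can be illustrated by evaluating the estimates directly. The sketch below is hedged: it codes the extreme-value estimates of Eqs. (28) and (29), with the argument of the logarithm in the correlated case taken as $N/(\xi_c\sqrt{2\pi})$ from the extrema quoted in the text, and the disorder average of Eq. (22) in lattice units ($d = 1$). All function names are hypothetical.

```python
import math


def typical_mfpt_uncorrelated(n, beta_sigma):
    """Extreme-value estimate exp[4 beta sigma sqrt(ln(N / (2 sqrt(2 pi))))]."""
    arg = n / (2.0 * math.sqrt(2.0 * math.pi))
    if arg <= 1.0:
        raise ValueError("N is too small for the extreme-value estimate")
    return math.exp(4.0 * beta_sigma * math.sqrt(math.log(arg)))


def typical_mfpt_correlated(n, beta_sigma, xi_c):
    """Estimate xi_c * exp[4 beta sigma sqrt(2 ln(N / (xi_c sqrt(2 pi))))]."""
    arg = n / (xi_c * math.sqrt(2.0 * math.pi))
    if arg <= 1.0:
        raise ValueError("N is too small for the extreme-value estimate")
    return xi_c * math.exp(4.0 * beta_sigma * math.sqrt(2.0 * math.log(arg)))


def average_mfpt(n, beta_sigma, xi_c):
    """Disorder average N^2 exp[2 (beta sigma)^2 (1 + exp(-1/xi_c))], with d = 1.

    xi_c = 0 selects the uncorrelated limit.
    """
    overlap = math.exp(-1.0 / xi_c) if xi_c > 0 else 0.0
    return n**2 * math.exp(2.0 * beta_sigma**2 * (1.0 + overlap))


# Both examples lie below the respective N_c at beta*sigma = 2.
print(typical_mfpt_uncorrelated(100, 2.0), average_mfpt(100, 2.0, 0.0))
print(typical_mfpt_correlated(1000, 2.0, 10.0), average_mfpt(1000, 2.0, 10.0))
```

In both cases the typical estimate comes out well below the disorder average, consistent with a broad MFPT distribution whose mean is dominated by rare profiles.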

A large difference between the median and the average values is a signature of a broad probability distribution. The insets of Fig. 8 present two probability density functions for the MFPT, at $N \ll N_c$ and $N \gg N_c$. For the short distance, the distribution is very broad and spans several orders of magnitude. For $N \gg N_c$, the system is self-averaging, in the sense that the MFPT distribution is much narrower, with almost coinciding median and average values.



C. Characteristics of random walk 

To complete the picture, we perform direct simulations of random walks in correlated and uncorrelated potential profiles; typical results are depicted in Fig. 9. One can see a clear qualitative difference between the two cases: random walks in the uncorrelated profile look very much like standard walks with $p_i = q_i = 1/2$, whereas the motion of a particle in a correlated profile has a somewhat different nature.

FIG. 9: Random walk in (a) uncorrelated, and (b) correlated (with $\xi_c = 20.0$) potential energy profiles.

As above, we see that the macroscopic motion of a particle in a correlated potential consists of subdiffusive as well as superdiffusive segments. It also appears that the particle tends to be localized near the bottom of "valleys" a few $\xi_c$ in extent, whereas in an uncorrelated profile, there are no preferred sites for localization. Obviously, when the time is measured in real-time units, rather than in number of steps, the particle is more likely to be found at the minima of the energy landscape in both cases. In terms of the number of steps, though, all sites of the uncorrelated landscape are revisited more or less uniformly.
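A direct simulation of this kind takes only a few lines. The sketch below is an assumption-laden illustration rather than the code used for Fig. 9: it normalizes the activated hopping weights so that $p_i + q_i = 1$ at every interior site, counts time in steps, and imposes reflecting ends on a finite profile (a boundary choice made here for simplicity). Function names are hypothetical.

```python
import math
import random


def hop_right_probabilities(u, beta):
    """p_i for every site; the two ends are reflecting (an assumption of this sketch)."""
    n = len(u)
    p = [0.0] * n
    p[0] = 1.0  # left end always steps right
    for i in range(1, n - 1):
        right = math.exp(-beta * (u[i + 1] - u[i]))
        left = math.exp(-beta * (u[i - 1] - u[i]))
        p[i] = right / (right + left)
    # p[n - 1] stays 0.0, so the right end always steps left
    return p


def simulate_walk(u, beta, start, n_steps, rng):
    """Return the list of visited sites, one entry per step (including the start)."""
    p = hop_right_probabilities(u, beta)
    x = start
    path = [x]
    for _ in range(n_steps):
        x += 1 if rng.random() < p[x] else -1
        path.append(x)
    return path


rng = random.Random(0)
flat = [0.0] * 50                      # flat landscape: p_i = q_i = 1/2
rough = [rng.gauss(0.0, 1.0) for _ in range(50)]
flat_path = simulate_walk(flat, 1.0, 25, 1000, rng)
rough_path = simulate_walk(rough, 1.0, 25, 1000, rng)
print(min(flat_path), max(flat_path), min(rough_path), max(rough_path))
```

Passing a correlated profile instead of `rough` lets one look for the alternation of fast scans inside valleys and rare jumps between them described above.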



IV. BIOLOGICAL IMPLICATIONS 

A. Transcription Factors 

Consider a DNA-binding protein searching for its target site on the genome. As explained in the Introduction, a correlated random energy landscape can arise from the interplay of sequence-dependent flexibility and the bending contribution to the total DNA-binding energy. Diffusion on such a landscape may then lead to localization in the energy "valleys," i.e. the protein will reside preferentially in specific (favorable) areas of the genome. Such nonuniform sampling has important implications for biological strategies of transcription factor binding. First, if a valley contains several binding sites, the rapid (superdiffusive) scanning of the valley leads to quick equilibration between these sites (while equilibration for similarly spaced sites outside a valley will take much longer). This is important when the protein binds nearby sites with distinct binding energies, and the strongest one has to be occupied first to provide correct regulation (as in the case of the Cro repressor). Second, several proteins bind their specific sites only when activated by ligands (e.g. PurR, GalS, etc.), spending the rest of the time in an inactive form "waiting" for the ligand. These proteins can benefit from staying close to the site in the waiting mode, since they can then quickly find their target upon activation.

One of the results of this study is that inhomogeneities significantly reduce the overall diffusion rate, as in Eq. (22). While this may be beneficial in confining a protein to favorable regions, it severely restricts the ability to search large portions of the genome by one-dimensional diffusion. Since we argue that a portion of the inhomogeneity originates from variations in the bending energy of the DNA, a potential strategy is for the binding protein to switch between two states which bend the DNA weakly or strongly. The weakly bending state is subject to reduced variations in the energy landscape and can diffuse more freely (search mode), compared to the strongly bending state, which is more likely to be confined in the vicinity of favorable energy valleys (waiting mode). One potential candidate for exploiting this strategy is the tetrameric LacI protein that consists of two DNA-binding dimeric subunits. Each subunit binds DNA and bends it slightly; when both subunits are bound, the DNA is deformed into an extended loop. Several experimental results suggest that LacI binds DNA with only one subunit while searching for its target site ("holding DNA with one arm"). Only when both subunits find their sites is the DNA bent into a loop. Very few structural data are available for proteins bound to DNA non-specifically (search mode). The above strategy suggests that DNA is less deformed in such complexes.

Another potential source for a correlated inhomogeneous energy landscape is an extended protein-DNA interface with net interactions that are the sum of several local contributions. (The addition of such correlated contributions leads to a much larger variance of energy than if they were uncorrelated.) This can be a significant effect for large multi-protein complexes (such as polymerases, TFIID, TFIIB complexes in yeast, etc.). To avoid slow-down by such inhomogeneities, protein complexes can avoid scanning DNA in the fully assembled state, when the protein-DNA interface is extensive. Individual components of the complex can search for their sites independently, assembling the whole complex only on the right site. In fact, most large protein-DNA complexes follow this strategy of assembly on the site, while many dimers and tetramers are assembled in solution.



B. Nucleosomes 

Other implications concern nucleosome positioning and dynamics. Wrapping of the DNA around these large multi-protein complexes is essential for packing DNA in the small volume of the cell nucleus. Nucleosomes, however, prevent transcription factors and other proteins from accessing DNA. To allow a transcription factor to access its target, nucleosomes close to that site have to be removed from the DNA or repositioned. While removal of nucleosomes is carried out by specific enzymes that chemically modify them (e.g. by histone methylation), repositioning relies in part on nucleosome mobility. In general, nucleosomes have to be (i) positioned at specified locations, and (ii) able to move along the DNA in the vicinity of the initial placement site, allowing access to this region of the DNA.

Nucleosome positioning is determined by specific sequences on the DNA. Such sequences are also known to provide DNA flexibility and/or internal curvature [?]. As discussed above, local DNA flexibility and curvature create a correlated energy landscape for binding. We suggest that inhomogeneous diffusion on such landscapes is an important element that provides both (i) preferential positioning of the nucleosomes due to DNA flexibility and curvature, and (ii) relatively rapid diffusion within the confines of the energy valley. Conversely, uncorrelated landscapes cannot achieve both objectives, since strong nucleosome binding sites prevent local diffusion along the DNA, while weak sites are not able to localize these proteins, leading to their random placement. In fact, experiments [36] have shown that nucleosome positioning sites are extended and fairly weak. Such a structure of positioning sites creates an extended valley on the correlated binding landscape, supporting our hypothesis.

This mechanism can also explain how certain proteins (such as HMGB) can reposition nucleosomes by binding to the DNA in their proximity. It has been suggested that such proteins alter the local mechanical properties of the DNA (such as its flexibility, curvature, or supercoiling), leading to repositioning of the nucleosome [37]. If the nucleosome is indeed preferentially localized by being trapped in a valley of the binding landscape, HMGB proteins may well alter the shape of the valley (e.g. by shrinking it on one side). Mobile nucleosomes, rapidly diffusing within the boundaries of the valley, will then reposition themselves in the new landscape.



C. Translocation 

In Sec. I B we described how slow (activated) passage of ssDNA through a nanopore can be modeled by diffusion over a correlated landscape. In particular, we demonstrated that if there are inhomogeneities in the charge of the DNA inside the channel, there will be variations in the potential energy landscape that are proportional to the applied voltage difference $V_0$. There is in fact scant structural information about the reconfigurations of charges (both free and bound) as DNA passes through a channel. Examining the variations in the MFPT of DNA as a function of the applied voltage may provide an indirect probe of any inhomogeneities in the charge passing through a channel.



V. CONCLUSIONS 

We studied one-dimensional diffusion in a random energy landscape with short-range correlations. We found that disorder with a short correlation length $\xi_c$ leads to a strong sample dependence of diffusion characteristics. The diffusive transport is influenced up to length scales exceeding $\xi_c$ by orders of magnitude. Three diffusion regimes can be identified:

1. For distances smaller than the correlation length ($N \ll \xi_c$), the disorder-averaged mean first passage time (MFPT) is

$$\langle t_{0,N} \rangle \sim N^2 \exp\left(4\sigma^2\beta^2 N/\xi_c\right).$$

At biologically relevant temperatures, the $N^2$ factor prevails; however, at low temperatures ($k_B T < 2\sigma/\sqrt{\xi_c}$), we obtain exponential creep (Sinai's diffusion).

2. For distances $N$ much larger than the characteristic value $N_c$, the MFPT exhibits some variability from sample to sample. However, the typical value of the MFPT is given by the disorder-averaged MFPT



above $N_c$. The characteristic distance $N_c$ equals



N 2 exp [2/3^(1 



The variance of the MFPT over the ensemble of potential profile realizations scales as $N^3$ with distance
for correlated profiles and e 2/3 CT for uncorrelated ones.

3. In the intermediate case $\xi_c \ll N \ll N_c$, the disorder-averaged MFPT behaves as described by Eq. (22). However, the MFPT distribution over the ensemble of profile realizations is much broader below $N_c$ than above it, as Fig. [^ demonstrates. As a result, a typical sample yields diffusion times orders of magnitude shorter than the average. This effect can be qualitatively understood in terms of the Random Energy Model. Below $N_c$, diffusion times are mostly influenced by high barriers and deep valleys that are at the extrema of the energy landscape histogram. The typical diffusion times are given by



N 

to.N ~ exp 4p<74/ln — -= 
y 2y 2tt 

for an uncorrelated profile, and 



to,N 



£ exp 



4/3o-i / 2 In 



N 



for a correlated one. Above $N_c$, most obstacles to the particle motion lie in the central region, so that Eq. (22) produces a valid estimate of a typical diffusion time: the system becomes self-averaging.
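
The contrast between typical and average diffusion times can be illustrated numerically. The Python sketch below is a minimal illustration under stated assumptions, not the procedure used to obtain the results above: the correlated profile is drawn from a stationary AR(1) process (a stand-in for the Metropolis-type construction of Appendix B), and the hopping ratio $\omega_i = q_i/p_i = \exp[\beta(U_{i+1} - U_{i-1})]$ is an assumed Arrhenius-like rule. The names `correlated_profile`, `mfpt`, and `sample_times`, as well as all parameter values, are hypothetical.

```python
import numpy as np


def correlated_profile(n_sites, sigma, xi_c, rng):
    """Stationary AR(1) Gaussian profile (assumption): variance sigma^2,
    two-point correlation decaying as exp(-r / xi_c)."""
    rho = np.exp(-1.0 / xi_c)
    noise = sigma * np.sqrt(1.0 - rho**2) * rng.standard_normal(n_sites)
    u = np.empty(n_sites)
    u[0] = sigma * rng.standard_normal()
    for i in range(1, n_sites):
        u[i] = rho * u[i - 1] + noise[i]
    return u


def mfpt(u, beta):
    """MFPT from site 0 to the last site of u, with a reflecting wall at 0.

    Uses the recursion a_i = 1 + omega_i * (1 + a_{i-1}) of Appendix A and
    sums a_i over i = 0 .. N-1. The form of omega_i is an assumption.
    """
    n = len(u) - 1                      # target site N
    total, a_prev = 0.0, 0.0
    for i in range(n):
        # omega_0 = 0 encodes the reflecting wall (p_0 = 1, q_0 = 0)
        w = 0.0 if i == 0 else np.exp(beta * (u[i + 1] - u[i - 1]))
        a = 1.0 + w * (1.0 + a_prev)
        total += a
        a_prev = a
    return total


def sample_times(n_samples, n_sites, sigma, xi_c, beta, seed=0):
    """MFPT for n_samples independent profile realizations."""
    rng = np.random.default_rng(seed)
    return np.array([
        mfpt(correlated_profile(n_sites + 1, sigma, xi_c, rng), beta)
        for _ in range(n_samples)
    ])


if __name__ == "__main__":
    times = sample_times(300, 100, sigma=1.5, xi_c=10.0, beta=1.0)
    print("mean MFPT  :", times.mean())
    print("median MFPT:", np.median(times))
```

For a flat profile `mfpt` reduces to the free-diffusion value $N^2$. With disorder, the sample mean is dominated by a few realizations with unusually high barriers, so the median (a proxy for the typical value) is expected to fall below it.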

These regimes appear to be relevant for biological sys- 
tems and provide qualitative insight into the kinetics of 
protein-DNA interaction. 

APPENDIX A: MEAN FIRST-PASSAGE TIME DERIVATION

The mean first passage time (MFPT) from site #0 to site #N is defined as the mean number of steps the particle has to make in order to reach site #N for the first time. The derivation here follows the one in Ref. [3{|. Let $P_{i,j}(n)$ denote the probability to start at site #i and to reach the site #j for the first time in exactly $n$ steps. Then, for example,

$$P_{i,i+1}(n) = p_i\, T_i(n-1), \tag{A1}$$

where $T_i(n)$ is defined as the probability of returning to the $i$-th site after $n$ steps without stepping to the right of it. Now, all the paths contributing to $T_i(n-1)$ should start with the step to the left and then reach the site #i in $n-2$ steps, not necessarily for the first time. Thus, the probability $T_i(n-1)$ can be written as

$$T_i(n-1) = q_i \sum_{m,l} P_{i-1,i}(m)\, T_i(l)\, \delta_{m+l,\,n-2}. \tag{A2}$$

We now introduce generating functions

$$P_{i,j}(z) = \sum_{n=0}^{\infty} z^n P_{i,j}(n), \qquad T_i(z) = \sum_{n=0}^{\infty} z^n T_i(n). \tag{A3}$$

One can easily show (see, e.g., Ref. [38]) that

$$P_{0,N}(z) = \prod_{i=0}^{N-1} P_{i,i+1}(z). \tag{A4}$$



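Knowing $P_{0,N}(z)$, one calculates the MFPT straightforwardly as

$$t_{0,N} = \frac{\sum_n n\, P_{0,N}(n)}{\sum_n P_{0,N}(n)} = \left[\frac{d}{dz} \ln P_{0,N}(z)\right]_{z=1} = \sum_{i=0}^{N-1} \left[\frac{d}{dz} \ln P_{i,i+1}(z)\right]_{z=1}. \tag{A5}$$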



Using Eqs. (A1) and (A2), we obtain the following recursion relation for $P_{i,i+1}(z)$:

$$P_{i,i+1}(z) = \frac{z\, p_i}{1 - z\, q_i\, P_{i-1,i}(z)}. \tag{A6}$$
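
The intermediate steps leading to Eq. (A6) can be spelled out as follows. This is a sketch that assumes the convention $T_i(0) = 1$ (the particle sits at site #i before taking any step), which the text does not state explicitly.

```latex
% Transform (A1): multiply by z^n and sum over n >= 1
P_{i,i+1}(z) = \sum_{n \ge 1} z^n\, p_i\, T_i(n-1) = z\, p_i\, T_i(z).

% Transform (A2): multiply by z^{n-1} and sum over n >= 2;
% the Kronecker delta turns the double sum into a product
T_i(z) - T_i(0) = z\, q_i\, P_{i-1,i}(z)\, T_i(z)
\quad\Longrightarrow\quad
T_i(z) = \frac{1}{1 - z\, q_i\, P_{i-1,i}(z)}.

% Combine the two results
P_{i,i+1}(z) = \frac{z\, p_i}{1 - z\, q_i\, P_{i-1,i}(z)}.
```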



To solve for $t_{0,N}$, we must introduce boundary conditions. Let $p_0 = 1$, $q_0 = 0$, which is equivalent to introducing a reflecting wall at $i = 0$. This boundary condition clearly influences the solution for short times and distances. However, as numerical simulations suggest, its influence relaxes quite fast, so that for longer times the result is independent of the boundary. The benefit of setting $p_0 = 1$ becomes clear when we observe that

$$P_{0,1}(1) = 1, \qquad \forall i:\ P_{i,i+1}(1) = 1. \tag{A7}$$

Hence,

$$t_{0,N} = \sum_{i=0}^{N-1} P'_{i,i+1}(1). \tag{A8}$$



The recursion relation for $P'_{i,i+1}(1)$ is readily obtained from Eq. (A6):

$$P'_{i,i+1}(1) = \frac{1}{p_i} + \frac{q_i}{p_i} P'_{i-1,i}(1) = 1 + \omega_i \left[1 + P'_{i-1,i}(1)\right], \tag{A9}$$

with $\omega_i = q_i/p_i$. Thus, the expression for $t_{0,N}$ is obtained in closed form as

$$t_{0,N} = N + \sum_{k=0}^{N-1} \omega_k + \sum_{k=0}^{N-2} \sum_{i=k+1}^{N-1} (1 + \omega_k) \prod_{j=k+1}^{i} \omega_j. \tag{A10}$$
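
As a consistency check, the recursion for $P'_{i,i+1}(1)$ and the closed-form double sum can be evaluated side by side. The Python sketch below is illustrative only; the function names `mfpt_recursion` and `mfpt_closed_form` are hypothetical, and the input is assumed to be an array `p` of right-step probabilities $p_i$ for sites $0 \ldots N-1$ with $q_i = 1 - p_i$ and $p_0 = 1$ (reflecting wall).

```python
import numpy as np


def mfpt_recursion(p):
    """Sum a_i = P'_{i,i+1}(1) over i, with a_i = 1 + omega_i * (1 + a_{i-1})."""
    p = np.asarray(p, dtype=float)
    omega = (1.0 - p) / p              # omega_i = q_i / p_i
    total, a_prev = 0.0, 0.0           # a_prev is irrelevant at i = 0 since omega_0 = 0
    for w in omega:
        a = 1.0 + w * (1.0 + a_prev)
        total += a
        a_prev = a
    return total


def mfpt_closed_form(p):
    """Closed-form expression: N + sum(omega) + double sum over products of omega."""
    p = np.asarray(p, dtype=float)
    omega = (1.0 - p) / p
    n = len(omega)
    total = n + omega.sum()
    for k in range(n - 1):             # k = 0 .. N-2
        prod = 1.0
        for i in range(k + 1, n):      # i = k+1 .. N-1
            prod *= omega[i]           # running product of omega_j, j = k+1 .. i
            total += (1.0 + omega[k]) * prod
    return total


if __name__ == "__main__":
    # Unbiased walk with a reflecting wall at site 0, target at N = 10
    p = np.array([1.0] + [0.5] * 9)
    print(mfpt_recursion(p), mfpt_closed_form(p))
```

For an unbiased walk ($p_i = 1/2$ for $i \ge 1$) both routines return $N^2$, the familiar free-diffusion result with a reflecting boundary. The recursion costs $O(N)$ operations, whereas the direct double sum costs $O(N^2)$.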



APPENDIX B: POTENTIAL PROFILE GENERATION



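Given the pseudoenergy partition function

$$Z(\lambda) = \int \mathcal{D}[U]\, e^{-\lambda H[U]}, \tag{B1}$$

the average pseudoenergy is

$$\langle H \rangle = -\left.\frac{\partial}{\partial\lambda} \ln Z(\lambda)\right|_{\lambda=1}, \tag{B2}$$

and the variance is

$$\langle (\Delta H)^2 \rangle = \langle H^2 \rangle - \langle H \rangle^2 = \left.\frac{\partial^2}{\partial\lambda^2} \ln Z(\lambda)\right|_{\lambda=1}. \tag{B3}$$

Straightforward calculation for the pseudoenergy given by Eq. JHJ yields

$$\langle H \rangle = L/2, \qquad \langle (\Delta H)^2 \rangle = L/2. \tag{B4}$$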
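
The values $L/2$ follow from a short scaling argument, sketched below under the assumption that $H[U]$ is a quadratic form in the $L$ lattice variables $U_i$ (as Gaussian statistics of the energy levels would imply).

```latex
% Rescale U' = \sqrt{\lambda}\, U; each of the L integration variables
% contributes a Jacobian factor \lambda^{-1/2}
Z(\lambda) = \int \mathcal{D}[U]\, e^{-\lambda H[U]}
           = \lambda^{-L/2} \int \mathcal{D}[U']\, e^{-H[U']}
           = \lambda^{-L/2}\, Z(1),
\qquad
\ln Z(\lambda) = -\frac{L}{2} \ln\lambda + \ln Z(1).

% First and second derivatives at \lambda = 1
\langle H \rangle = -\left.\frac{\partial}{\partial\lambda} \ln Z\right|_{\lambda=1} = \frac{L}{2},
\qquad
\langle (\Delta H)^2 \rangle = \left.\frac{\partial^2}{\partial\lambda^2} \ln Z\right|_{\lambda=1} = \frac{L}{2}.
```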



Hence, typical potential profiles have pseudoenergies in the range $L/2 \pm \sqrt{L/2}$. This result, together with the Gaussian statistics of energy levels of Eq. ©, forms the basis of the algorithm we employ for building the energy profiles. First, a random and uncorrelated potential profile obeying Gaussian statistics with the required variance $\sigma^2$ is generated on a one-dimensional lattice. Next, we look for a permutation of lattice sites that produces a typical pseudoenergy $H[U]$ for a given correlation length $\xi_c$ (or, equivalently, for given values of $\alpha$ and $\gamma$). This is accomplished by a Metropolis-type algorithm that converges to a prescribed value of pseudoenergy picked at random from a Gaussian distribution around $\langle H \rangle$; see Fig. 10.
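
The two-step construction described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used here, and several ingredients are assumptions: the quadratic pseudoenergy $H[U] = \frac{1}{2}\sum_i [\alpha U_i^2 + \gamma (U_{i+1} - U_i)^2]$, the continuum-limit relations $\alpha = 1/(2\sigma^2\xi_c)$ and $\gamma = \xi_c/(2\sigma^2)$, the acceptance rule based on the distance $|H - H_{\rm target}|$, and the `temperature` parameter. The names `pseudoenergy` and `build_profile` are hypothetical.

```python
import numpy as np


def pseudoenergy(u, alpha, gamma):
    """Assumed quadratic pseudoenergy of a profile on an open chain."""
    return 0.5 * (alpha * np.sum(u**2) + gamma * np.sum(np.diff(u)**2))


def build_profile(length, sigma, xi_c, n_steps=50000, temperature=1.0, seed=0):
    """Permute an uncorrelated Gaussian profile until H[U] is close to a target.

    Returns (profile, target, alpha, gamma). The best permutation visited
    during the run is returned.
    """
    rng = np.random.default_rng(seed)
    alpha = 1.0 / (2.0 * sigma**2 * xi_c)      # assumed continuum-limit relation
    gamma = xi_c / (2.0 * sigma**2)            # assumed continuum-limit relation

    # Step 1: uncorrelated Gaussian profile with variance sigma^2
    u = sigma * rng.standard_normal(length)

    # Target pseudoenergy drawn from a Gaussian with mean L/2 and variance L/2
    target = length / 2.0 + np.sqrt(length / 2.0) * rng.standard_normal()

    # Step 2: Metropolis-type search over permutations (pairwise site swaps)
    cost = abs(pseudoenergy(u, alpha, gamma) - target)
    best_u, best_cost = u.copy(), cost
    for _ in range(n_steps):
        i, j = rng.integers(0, length, size=2)
        u[i], u[j] = u[j], u[i]                # propose a swap
        new_cost = abs(pseudoenergy(u, alpha, gamma) - target)
        accept = new_cost <= cost or rng.random() < np.exp(-(new_cost - cost) / temperature)
        if accept:
            cost = new_cost
            if cost < best_cost:
                best_u, best_cost = u.copy(), cost
        else:
            u[i], u[j] = u[j], u[i]            # reject: undo the swap
    return best_u, target, alpha, gamma


if __name__ == "__main__":
    profile, target, alpha, gamma = build_profile(200, sigma=1.0, xi_c=5.0, n_steps=20000)
    print("target H:", target, " reached H:", pseudoenergy(profile, alpha, gamma))
```

Because only site permutations are proposed, the single-site histogram (and hence the variance $\sigma^2$) is preserved exactly; only the gradient term of the pseudoenergy changes. Recomputing $H$ in full at each step keeps the sketch short; a production version would update only the terms adjacent to the swapped sites.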






FIG. 10: Pseudoenergy probability density for a profile of length L = 10000, with $\sigma = 1.0$, $\xi_c = 20.0$. Insets: (a) Typical potential profile; (b) Potential profile correlator $g(r) = \frac{1}{2}\langle [U(x) - U(x + r)]^2 \rangle$; the averaging was performed over 1000 profile realizations.






[1] J. D. Murray, Mathematical Biology (Springer-Verlag, 2002).
[2] C. Bustamante, D. Keller, and G. Oster, Acc. Chem. Res. 34, 412 (2001).
[3] R. D. Vale and R. A. Milligan, Science 288, 88 (2000).
[4] R. D. Astumian, Science 276, 917 (1997).
[5] D. K. Lubensky and D. R. Nelson, Biophys. J. 77, 1824 (1999).
[6] P. G. de Gennes, Physica A 274, 1 (1999).
[7] H. Salman, D. Zbaida, Y. Rabin, D. Chatenay, and M. Elbaum, Proc. Nat. Acad. Sci. USA 98, 7247 (2001).
[8] A. Meller, L. Nivon, and D. Branton, Phys. Rev. Lett. 86, 3435 (2001).
[9] O. G. Berg, R. B. Winter, and P. H. von Hippel, Biochemistry 20, 6929 (1981).
[10] P. H. von Hippel and O. G. Berg, J. Biol. Chem. 264, 675 (1989).
[11] N. Shimamoto, J. Biol. Chem. 274, 15293 (1999).
[12] O. G. Berg and P. H. von Hippel, J. Mol. Biol. 193, 723 (1987).
[13] U. Gerland, J. D. Moroz, and T. Hwa, Proc. Nat. Acad. Sci. USA 99, 12015 (2002).
[14] R. F. Bruinsma, Physica A 313, 211 (2002).
[15] D. A. Erie, G. Yang, H. C. Schultz, and C. Bustamante, Science 266, 1562 (1994).
[16] M. G. Munteanu, K. Vlahovicek, S. Parthasarathy, I. Simon, and S. Pongor, Trends Biochem. Sci. 23, 341 (1998).
[17] K. Vlahovicek, L. Kajan, and S. Pongor, Nucleic Acids Res. 31, 3686 (2003), http://www.icgeb.trieste.it/dna/.
[18] I. Brukner, R. Sanchez, D. Suck, and S. Pongor, EMBO J. 14, 1812 (1995).
[19] A. Gabrielian and S. Pongor, FEBS Lett. 393, 65 (1996).
[20] H. Schiessel, J. Widom, R. F. Bruinsma, and W. M. Gelbart, Phys. Rev. Lett. 86, 4414 (2001).
[21] J. Widom, Annu. Rev. Biophys. Biomol. Struct. 27, 285 (1998).
[22] J. Widom, Quart. Rev. Biophys. 34, 269 (2001).
[23] R. Kiyama and E. N. Trifonov, FEBS Lett. 523, 7 (2002).
[24] D. Cule and T. Hwa, Phys. Rev. Lett. 80, 3145 (1998).
[25] A. Meller, J. Phys.: Condens. Matter 15, R581 (2003).
[26] B. Derrida, Phys. Rev. B 24, 2613 (1981).
[27] J. Bryngelson and P. Wolynes, Proc. Nat. Acad. Sci. USA 84, 7524 (1987).
[28] J. P. Bouchaud, A. Comtet, A. Georges, and P. Le Doussal, Ann. Phys. 201, 285 (1990).
[29] P. G. De Gennes, J. Stat. Phys. 12, 463 (1975).
[30] P. Le Doussal and V. M. Vinokur, Physica C 254, 63 (1995).
[31] D. S. Fisher, P. Le Doussal, and C. Monthus, Phys. Rev. Lett. 80, 3539 (1998).
[32] T. Hwa, E. Marinari, K. Sneppen, and L. Tang, Proc. Nat. Acad. Sci. USA 100, 4411 (2003).
[33] Y. G. Sinai, Theory Probab. Appl. 27, 247 (1982).
[34] S. H. Noskowicz and I. Goldhirsch, Phys. Rev. Lett. 61, 500 (1988).
[35] K. P. N. Murthy and K. W. Kehr, Phys. Rev. A 40, 2082 (1989).
[36] A. Thastrom, P. T. Lowary, H. R. Widlund, H. Cao, M. Kubista, and J. Widom, J. Mol. Biol. 288, 213 (1999).
[37] A. A. Travers, EMBO Rep. 4, 131 (2003).
[38] I. Goldhirsch and Y. Gefen, Phys. Rev. A 33, 2583 (1986).
[39] DNA bending by transcription factors is a well-known phenomenon, though practically all the available experimental data focus on proteins bound to operator sequences.
[40] For α-haemolysin, the diameter of the limiting aperture is about 15 Å.



